Can Document-Genre Metadata Improve Information Access to Large Digital Collections?

Authors

  • Kevin Crowston
  • Barbara H. Kwasnik

Abstract

We discuss how the identification and use of document genres could help resolve the information-retrieval problem in large digital collections. Explicit identification of genre seems particularly important for such collections because a search usually retrieves documents of many different genres, with no obvious clues to distinguish one genre from another. Also, because most genres are characterized by both form and purpose, identifying the genre of a document provides information about the document's purpose and its fit to the user's situation, which can otherwise be difficult to assess. We begin by outlining the possible role of genre identification in the information-retrieval process. Our assumption is that genre identification would enhance searching, first, because we know that topic alone is not enough to define an information problem and, second, because search results that include genre information would be easier to understand. Next, we discuss how information professionals have traditionally tackled the problem of representing genre in settings where topical representation is the norm. Finally, we address the difficulties of studying how effective genre identification is in large digital collections. Because genre is often an implicit notion, studying it in a systematic way presents many problems. We outline a research protocol that would provide guidance for identifying Web document genres, for observing how genre is used in searching and in evaluating search results, and finally for representing and visualizing genres.

Kevin Crowston and Barbara H. Kwasnik, Syracuse University School of Information Studies, 4–206 Centre for Science and Technology, Syracuse, NY 13244–4100
LIBRARY TRENDS, Vol. 52, No. 2, Fall 2003, pp. 345–361
© 2003 The Board of Trustees, University of Illinois

Introduction

Current computerized information-access systems face a fundamental limitation: they know what documents say but not what they mean or for what purposes they might be useful. Extracting and representing the meaning of documents is difficult and time-consuming, and automatic systems still have significant limitations. We note, though, that humans rarely have to read every word of a document to understand its purpose. Instead, people take a shortcut: they first identify the kind of document they are faced with (i.e., its genre) and then use each kind of document in an appropriate way. For example, a grant proposal is used differently from a syllabus, a product brochure, or a bank statement. Accordingly, differences in an information situation are often reflected in the kind of document that is considered helpful (e.g., a problem set, a lesson plan, and a tutorial on mathematics are all about math, but each is useful in a different situation). Information-access systems would be more useful for many tasks if they could similarly distinguish the purposes of documents and handle them accordingly.

In this paper we discuss the possibility of improving information access in large digital collections through the identification and use of document genre as a facet of document and query representation. First, we provide some historical background on the concept of genre and on the approach it offers to the problem of incorporating context into information retrieval. Next, we outline the information-retrieval problem as it relates to genre, along with some traditional solutions that have been attempted. Finally, we outline a research agenda that addresses some of the questions and issues that investigating genre entails.
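The introduction's mathematics example can be sketched in a few lines of Python to show what treating genre as a facet of both document and query representation might look like. This is a minimal toy sketch under stated assumptions, not the authors' system: the `Document` and `Query` types, the `search` and `group_by_genre` functions, the document titles, and the genre labels are all hypothetical, and topical matching is reduced to exact string equality for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Document:
    title: str
    topic: str  # what the document is about
    genre: str  # what kind of document it is (hypothetical genre label)


@dataclass(frozen=True)
class Query:
    topic: str
    genre: Optional[str] = None  # genre facet; None means "any genre"


def search(collection: List[Document], query: Query) -> List[Document]:
    """Match on topic, then narrow by the genre facet if the query has one."""
    hits = [doc for doc in collection if doc.topic == query.topic]
    if query.genre is not None:
        hits = [doc for doc in hits if doc.genre == query.genre]
    return hits


def group_by_genre(results: List[Document]) -> Dict[str, List[str]]:
    """Label a result list by genre so the kinds of documents returned are visible."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc in results:
        groups[doc.genre].append(doc.title)
    return dict(groups)


# Hypothetical toy collection: three mathematics documents of different genres.
COLLECTION = [
    Document("Practice problems: quadratic equations", "mathematics", "problem set"),
    Document("Week 3 plan: teaching quadratic equations", "mathematics", "lesson plan"),
    Document("A gentle introduction to quadratic equations", "mathematics", "tutorial"),
    Document("Course outline: European history", "history", "syllabus"),
]

if __name__ == "__main__":
    # Topic alone cannot tell the three mathematics documents apart.
    print(group_by_genre(search(COLLECTION, Query("mathematics"))))
    # Adding the genre facet returns only the document suited to a learner.
    print([d.title for d in search(COLLECTION, Query("mathematics", "tutorial"))])
```

A topic-only query returns all three mathematics documents, which a user would otherwise have to open to tell apart; grouping the results by genre makes their kinds visible, and a query that also specifies a genre narrows the list to the document that fits the user's situation. The sketch assumes genre labels already exist, which is precisely the identification problem the paper raises.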
Theory: Document Genre

Rhetoricians since Aristotle have attempted to classify communications with similar form or purpose into types or "genres." Numerous definitions of genre, or discourse type, have been suggested (e.g., Longacre, 1983; Miller, 1984; Swales, 1990). In our discussion, we draw on the definition of genre proposed by Orlikowski and Yates (1994), who describe genre as "a distinctive type of communicative action, characterized by a socially recognized communicative purpose and common aspects of form" (p. 543). For instance, this document is an example of the journal article genre. It has a form familiar to most researchers and practitioners and is shaped by the journal's editorial policies as well as by the profession's communication practices. There are many document genres: some are common, such as the report or the newsletter, and others are restricted to specific domains, such as the course syllabus or the problem set in higher education. Genre applies to electronic as well as physical documents. For example, in a study of Web documents, Crowston and Williams (2000) were able to identify documents of many familiar genres and of a few genres that seemed to …

Similar resources

A Metadata Approach to Manage and Organize Electronic Documents and Collections on the Web

In recent years, the number of information sources offered on the Web has grown tremendously. Support for accessing these information sources has mostly been concentrated on browsing and search tools. Digital libraries and Web directories constitute important initiatives to improve information access, creating and organizing document collections hierarchically, according to different criteria. ...

Genre Classification in Automated Ingest and Appraisal Metadata

Metadata creation is a crucial aspect of the ingest of digital materials into digital libraries. Metadata needed to document and manage digital materials are extensive and manual creation of them expensive. The Digital Curation Centre (DCC) has undertaken research to automate this process for some classes of digital material. We have segmented the problem and this paper discusses results in gen...

Personalized information spaces: improved access to chemical digital libraries

Today digital libraries provide access to a vast, but largely unstructured, amount of document collections. Facing the ever increasing challenge of the information overload content providers have to focus on new ways in user-centered retrieval, not only providing tools for searching information, but tools for personalizing, managing, evaluating and working with the returned search results. With...

Jyväskylä Studies in Computing 37: Contextual and Structural Metadata in Enterprise Document Management, University of Jyväskylä

Lyytikäinen, Virpi. Documents have a central role in organizations. While the amount of information continually increases, new kinds of methods for managing the documents are needed. Enterprise document management concerns the whole life cycle of documents in organizations, from emergence to disposition, and also development of...

Automatic Keyword Extraction for Learning Object Repositories

Learning object repositories are digital collections of educational materials, e.g., lectures, notes, presentations, which can be used to support learning. The main purpose of such repositories is to improve the sharing and reusability of the learning objects, which can be defined as "any digital resource that can be reused to support learning" (Wiley, 2000, p. 7). An important asp...

Journal:
  • Library Trends

Volume 52

Publication date: 2003